-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MariaDB statistics support #19408
Conversation
c5df09c
to
4e50d1f
Compare
4e50d1f
to
8c1747a
Compare
Change `TestingMySqlServer.execute(String sql)` and `TestingMySqlServer.execute(String sql)` to run in context of the default database ('tpch'), simplifying usage.
Get table statistics for MariaDB tables.
8c1747a
to
fcd57b9
Compare
( just rebased, after #19391 merged ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
plugin/trino-mariadb/src/main/java/io/trino/plugin/mariadb/MariaDbClient.java
Show resolved
Hide resolved
if (!columnIndexStatistics.nullable()) { | ||
double knownNullFraction = columnStatisticsBuilder.build().getNullsFraction().getValue(); | ||
if (knownNullFraction > 0) { | ||
log.warn("Inconsistent statistics, null fraction for a column %s, %s, that is not nullable according to index statistics: %s", table, columnName, knownNullFraction); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you mean "column %s.%s"? otherwise it would log foo, bar
where foo
is table and bar
is column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i admit having copied this from MySQL's io.trino.plugin.mysql.MySqlClient#updateColumnStatisticsFromIndexStatistics
yes, we can improve both places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take care of this.
// row count from INFORMATION_SCHEMA.TABLES may be very inaccurate | ||
rowCount = max(rowCount, columnIndexStatistics.cardinality()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code seems to assume inaccuracy is always towards the lower values - is this a fair assumption to make?
i.e. rowCount is always lower than column index cardinality value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think it is. I think the estimate may be well under-estimated and also over-estimated.
However, row count and NDV are related. We meed to do something (either way).
I copied the current approach from MySQL connector stats code (where it's probably where i implemented it earlier)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, looking at a few other places the logic is that overestimating stats is better than underestimation because underestimation can actually lead to objectively bad plans.
e.g. broadcast of a table that is too large to broadcast is probably worse than a hash join even if the table could be broadcast.
Marking as resolved.
AND SUB_PART IS NULL -- ignore cases where only a column prefix is indexed | ||
AND CARDINALITY IS NOT NULL -- CARDINALITY might be null (https://stackoverflow.com/a/42242729/65458) | ||
AND CARDINALITY != 0 -- CARDINALITY is initially 0 until analyzed | ||
GROUP BY COLUMN_NAME -- there might be multiple indexes on a column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to port this to MySQL as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was copied from MySQL and slightly adapted.
i think here AND CARDINALITY != 0
line was added (based on observation with MariaDB)
i suspect MySQL benefits from AND CARDINALITY IS NOT NULL
line and maybe it's redundant for mariadb. it is very hard to test this, so i retained old condition and just added a new one
someone would need to observe what mysql is doing to determine whether AND CARDINALITY != 0
would make sense there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take care of this.
...trino-mariadb/src/test/java/io/trino/plugin/mariadb/BaseMariaDbTableIndexStatisticsTest.java
Show resolved
Hide resolved
getQueryRunner()::execute, | ||
"test_numeric_corner_cases_", | ||
ImmutableMap.<String, List<String>>builder() | ||
// TODO Infinity and NaNs not supported by MySQL. Are they not supported in MariaDB as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd look at type mapping tests to find the answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point.
@homar please follow-up here. See all // TODO Infinity and NaNs not supported by MySQL
occurrences in the code.
|
||
@Test | ||
@Override | ||
public void testStatsWithPredicatePushdownWithStatsPrecalculationDisabled() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one case to test here should also be when multi-column indexes exist.
I'm thinking of case where cardinality of a multi-column index can be > actual row count. (I don't know what "CARDINALITY" refers to in MariaDB case so maybe this is an impractical situation to test for).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking of case where cardinality of a multi-column index can be > actual row count.
that's possible because cardinality is just an estimate.
one case to test here should also be when multi-column indexes exist.
good point, we should have a test for this.
For both MySQL and MariaDB.
@homar can you please follow-up?
Description
Get table statistics for MariaDB tables.